Proactive and reactive multi - dimensional histogram maintenance for selectivity estimation q
نویسندگان
چکیده
Many state-of-the-art selectivity estimation methods use query feedback to maintain histogram buckets, thereby using the limited memory efficiently. However, they are ‘‘reactive’’ in nature, that is, they update the histogram based on queries that have come to the system in the past for evaluation. In some applications, future occurrences of certain queries may be predicted and a ‘‘proactive’’ approach can bring much needed performance gain, especially when combined with the reactive approach. For these applications, this paper provides a method that builds customized proactive histograms based on query prediction and mergers them into reactive histograms when the predicted future arrives. Thus, the method is called the proactive and reactive histogram (PRHist). Two factors affect the usefulness of the proactive histograms and are dealt with during the merge process: the first is the predictability of queries and the second is the extent of data updates. PRHist adjusts itself to be more reactive or more proactive depending on these two factors. Through extensive experiments using both real and synthetic data and query sets, this paper shows that in most cases, PRHist outperforms STHoles, the state-of-the-art reactive method, even when only a small portion of the queries are predictable and a significant portion of data is updated. 2007 Published by Elsevier Inc. E
منابع مشابه
Proactive and reactive multi-dimensional histogram maintenance for selectivity estimation
Many state-of-the-art selectivity estimation methods use query feedback to maintain histogram buckets, thereby using the limited memory efficiently. However, they are “reactive” in nature, that is, they update the histogram based on queries that have come to the system in the past for evaluation. In some applications, future occurrences of certain queries may be predicted and a “proactive” appr...
متن کاملEfficient Selectivity Estimation by Histogram Construction Based on Subspace Clustering
Modern databases have to cope with multi-dimensional queries. For efficient processing of these queries, query optimization relies on multi-dimensional selectivity estimation techniques. These techniques in turn typically rely on histograms. A core challenge of histogram construction is the detection of regions with a density higher than the ones of their surroundings. In this paper, we show th...
متن کاملDigitHist: a Histogram-Based Data Summary with Tight Error Bounds
We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an accurate and reliable histogram approach for multi-dimensional data. To achieve a compact summary, we use a sparse representation combined with a n...
متن کاملIntegrating Query-Feedback Based Statistics into Informix Dynamic Server
Statistics that accurately describe the distribution of data values in the columns of relational tables are essential for effective query optimization in a database management system. Manually maintaining such statistics in the face of changing data is difficult and can lead to suboptimal query performance and high administration costs. In this paper, we describe a method and prototype implemen...
متن کاملA Histogram Utilizing the Cluster Information
Histograms are summary structures of large datasets, which are mainly used for selectivity estimation during query optimization. Selectivity estimation is the fast approximation of query result size. In this paper, we focus on multi-dimensional histograms, especially bidimensional histograms. At the time of selectivity estimation, buckets partially overlapping with a query return approximated r...
متن کامل